Comparing General and Medical Texts for Information Retrieval Based on Natural Language Processing: An Inquiry into Lexical Disambiguation

Authors

  • Patrick Ruch
  • Robert H. Baud
  • Antoine Geissbühler
  • Anne-Marie Rassinoux
Abstract

In this paper we compare two types of corpus, focusing on the lexical ambiguity of each. The first corpus consists mainly of general newspaper articles and literature excerpts, while the second belongs to the medical domain. To conduct the study, we used two different disambiguation tools. Each tool was first validated in its respective application area. We then used these systems to assess and compare both the general ambiguity rate and the particularities of each domain. Quantitative results show that medical documents are lexically less ambiguous than unrestricted documents. Our conclusions emphasize the importance of the application area in the design of NLP tools.

Similar resources

Evaluating Wordnets in Cross-language Information Retrieval: the ITEM Search Engine

This paper presents the ITEM multilingual search engine. This search engine performs full lexical processing (morphological analysis, tagging and Word Sense Disambiguation) on documents and queries in order to provide language-neutral indexes for querying and retrieval. The indexing terms are the EuroWordNet/ITEM InterLingual Index records that link wordnets in 10 languages of the European Comm...

Roget's Thesaurus as a Lexical Resource for Natural Language Processing

WordNet proved that it is possible to construct a large-scale electronic lexical database on the principles of lexical semantics. It has been accepted and used extensively by computational linguists ever since it was released. Some of its applications include information retrieval, language generation, question answering, text categorization, text classification and word sense disambiguation. I...

Comparing Corpora And Lexical Ambiguity

In this paper we compare two types of corpus, focusing on the lexical ambiguity of each of them. The first corpus consists mainly of newspaper articles and literature excerpts, while the second belongs to the medical domain. To conduct the study, we have used two different disambiguation tools. However, first of all, we must verify the performance of each system in its respective application do...

Evaluating Resources for Query Translation in Cross-Language Information Retrieval

Our goal is to evaluate the utility of a lexical resource containing Lexical Conceptual Structures (LCS) for use in cross-language information retrieval. Our evaluation makes use of a combination of techniques from interlingual machine translation (Dorr) with conventional information retrieval techniques (Oard; Oard and Dorr). Given a query in one language, we transform the query into the corresponding ter...

Word Sense Clustering Based on Translation Equivalence in Parallel Texts; a Case Study in Romanian

Lexical ambiguity is one of the most difficult problems to solve in natural language processing. Word sense disambiguation (WSD) is a task of utmost importance, and as such the great interest in this area of research is not surprising. We argue that the difficulty of solving the problem of sense disambiguation depends to a large extent on the application which requires a solutio...

Journal:
  • Studies in health technology and informatics

Volume 84 Pt 1

Publication date: 2001